Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction

Neural Information Processing Systems

We propose the Canonical 3D Deformer Map, a new representation of the 3D shape of common object categories that can be learned from a collection of 2D images of independent objects. Our method builds in a novel way on concepts from parametric deformation models, non-parametric 3D reconstruction, and canonical embeddings, combining their individual advantages. In particular, it learns to associate each image pixel with a deformation model of the corresponding 3D object point which is canonical, i.e. intrinsic to the identity of the point and shared across objects of the category. The result is a method that, given only sparse 2D supervision at training time, can, at test time, reconstruct the 3D shape and texture of objects from single views, while establishing meaningful dense correspondences between object instances. It also achieves state-of-the-art results in dense 3D reconstruction on public in-the-wild datasets of faces, cars, and birds.
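The abstract's core idea — associating every image pixel with a canonical, per-point linear deformation model — can be illustrated with a minimal numerical sketch. All dimensions and variable names below are hypothetical stand-ins (in the actual method these quantities would be predicted by a network); the sketch only shows the arithmetic of a C3DPO-style linear shape basis evaluated densely at each pixel.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: K basis shapes, an H x W image.
K, H, W = 4, 8, 8

# A canonical deformer map gives each pixel a per-point linear model:
# a mean 3D location plus K basis displacements. Random stand-ins here;
# in the method itself these are canonical, i.e. shared across instances.
mean_xyz = rng.normal(size=(H, W, 3))     # mean 3D point per pixel
basis = rng.normal(size=(H, W, K, 3))     # K deformation directions per pixel

# Per-image deformation coefficients, shared by all pixels of an instance.
alpha = rng.normal(size=(K,))

# Reconstructed 3D point for every pixel: X = mean + sum_k alpha_k * B_k
points = mean_xyz + np.einsum('hwkd,k->hwd', basis, alpha)
print(points.shape)  # (8, 8, 3): a dense 3D reconstruction of the view
```

Because the basis is intrinsic to point identity rather than to any one instance, two images of different objects that map a pixel to the same canonical point are, by construction, in dense correspondence.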


Review for NeurIPS paper: Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction


This is true -- CMR did not backprop through the texture loss. However, this CVPR'20 work from Henderson et al. shows that it can be done: https://arxiv.org/abs/2004.04180. This paper may not have been known to the authors (CVPR happened around the NeurIPS deadline), so I'm fine if they correct and discuss this point in the main paper. To me, these are the main differences between CMR, CSM, and the proposed approach: (i) CMR is akin to a direct method -- backpropagation through the texture yields a photometric-like loss (not quite a photometric loss, since a perceptual loss is used instead, but close enough); (ii) CSM learns to establish correspondences from image pixels to a fixed shape template that does not adapt to the depicted shape (their articulated-CSM follow-up, CVPR 2020, allows the template to deform, but the deformation is driven by a semi-manually defined skeleton, which lacks the capacity to capture surface detail); (iii) the proposed approach learns to establish correspondences from image pixels to the parameterized surface of a (C3DPO) shape basis that then deforms to the depicted shape. In the classical debate of direct versus correspondence methods, I view the proposed method as belonging to the latter camp. My hypothesis is that, similar to how correspondence methods played out in the late '90s and 2000s, the proposed approach may be less susceptible to local minima than direct methods during shape-fitting optimization. There is room to investigate this issue more fully, though that may be outside the scope of this paper. I still think (iii) is a hybrid of CMR and CSM (and still relies on known keypoints), but on reflection I find this combination a reasonable idea.
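The correspondence-style objective the reviewer attributes to camp (iii) can be sketched numerically: each foreground pixel predicts a 3D point on the deformed category surface, and the loss asks that this point, projected by the camera, land back on the pixel it came from. Everything below (pixel samples, the pinhole model, the focal length) is a hypothetical stand-in for illustration, not the paper's actual loss.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100  # number of sampled foreground pixels

pixels_uv = rng.uniform(0, 64, size=(N, 2))        # observed 2D locations
points_3d = rng.normal(size=(N, 3)) + [0, 0, 5]    # predicted surface points
focal = 32.0                                       # assumed focal length

def project(X, f):
    """Simple pinhole projection, assumed here for illustration."""
    return f * X[:, :2] / X[:, 2:3]

# Reprojection loss: unlike a photometric (direct) loss that compares
# rendered and observed colors, this penalizes 2D distance between each
# pixel and the reprojection of its predicted 3D correspondence.
reproj = project(points_3d, focal)
loss = np.mean(np.sum((reproj - pixels_uv) ** 2, axis=1))
```

This is the sense in which the approach sits in the correspondence camp: the supervisory signal is a geometric residual in pixel coordinates rather than an appearance residual, which plausibly gives a smoother optimization landscape during shape fitting.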


Review for NeurIPS paper: Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction


How much of the improvement comes from the implicit shape representation over meshes? The proposed approach of combining local and global information via CSM's consistency loss could also have been implemented with meshes (CSM was mesh-based). What do the results look like with meshes? Conversely, CMR could have used this implicit shape representation. This key ablation study is missing.

